Reward Design for Multi-Agent Reinforcement Learning with a Penalty Based on the Payment Mechanism

Abstract

In this paper, we propose a novel method of reward design for multi-agent reinforcement learning (MARL). One of the main uses of MARL is building cooperative policies between self-interested agents. We take inspiration from the concept of mechanism design from game theory to modify how agents are rewarded in MARL algorithms. We define a payment that reflects the negative contribution to other agents' valuation in the same manner as the Vickrey-Clarke-Groves (VCG) mechanism. We give each individual agent a reward signal that consists of two elements. The first is a reward evaluated solely on the basis of individual behavior, which on its own will follow a greedy and selfish policy, and the second is a penalty that reflects the negative contribution to social welfare. We call this scheme reward design based on the payment mechanism (RDPM). We experimented with RDPM in different scenarios. We show that RDPM can increase the utility among agents while other reward designs achieve far less, even for basic and simplistic problems. We finally analyze and discuss how RDPM affects the building of a cooperative policy.
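The abstract gives no formula for the payment, so the following is only a minimal sketch of a generic VCG-style (Clarke pivot) penalty, not the authors' RDPM implementation. It assumes a hypothetical reward function that returns one individual reward per agent, and it assumes the counterfactual "agent i did not interfere" can be modeled by replacing agent i's action with a hypothetical `NOOP` action. The toy game and every name in it are illustrative.

```python
from typing import Callable, List, Sequence

# Hypothetical signature: joint action -> one individual reward per agent.
RewardFn = Callable[[Sequence[str]], List[float]]

NOOP = "noop"  # assumed default action used to build the counterfactual


def payment(reward_fn: RewardFn, joint_action: Sequence[str], i: int) -> float:
    """VCG-style payment: the loss that agent i's action imposes on the others."""
    actual = reward_fn(joint_action)
    # Counterfactual: same joint action, but agent i does nothing.
    counterfactual_action = list(joint_action)
    counterfactual_action[i] = NOOP
    counterfactual = reward_fn(counterfactual_action)
    others_without_i = sum(r for j, r in enumerate(counterfactual) if j != i)
    others_with_i = sum(r for j, r in enumerate(actual) if j != i)
    # Not clipped here: a negative value would act as a subsidy (an assumption).
    return others_without_i - others_with_i


def shaped_rewards(reward_fn: RewardFn, joint_action: Sequence[str]) -> List[float]:
    """Individual reward minus the payment, for every agent."""
    actual = reward_fn(joint_action)
    return [
        actual[i] - payment(reward_fn, joint_action, i)
        for i in range(len(joint_action))
    ]


def toy_reward(joint_action: Sequence[str]) -> List[float]:
    """Toy shared resource: a lone 'take' earns 4.0, colliding takers earn 1.0 each."""
    takers = sum(a == "take" for a in joint_action)
    take_reward = 4.0 if takers == 1 else 1.0
    return [take_reward if a == "take" else 0.0 for a in joint_action]


if __name__ == "__main__":
    print(shaped_rewards(toy_reward, ["take", "take"]))  # [-2.0, -2.0]
    print(shaped_rewards(toy_reward, ["take", "noop"]))  # [4.0, 0.0]
```

In this toy game, two colliding agents each receive 1.0 but cost the other 3.0, so the shaped reward is negative and the selfish joint action becomes unattractive, while a lone taker pays nothing.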

Similar resources

Plan-based reward shaping for multi-agent reinforcement learning

Recent theoretical results have justified the use of potential-based reward shaping as a way to improve the performance of multi-agent reinforcement learning (MARL). However, the question remains of how to generate a useful potential function. Previous research demonstrated the use of STRIPS operator knowledge to automatically generate a potential function for single-agent reinforcement learnin...

Abstract MDP Reward Shaping for Multi-Agent Reinforcement Learning

Kyriakos Efthymiadis, Sam Devlin and Daniel Kudenko (Department of Computer Science, The University of York, UK). Reward shaping has been shown to significantly improve an agent's performance in reinforcement learning. As attention is shifting from tabula-rasa approaches to methods where some heuristic domain knowledge can be give...

Decentralized multi-agent reinforcement learning in average-reward dynamic DCOPs

Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in oth...

Potential-based reward shaping for knowledge-based, multi-agent reinforcement learning

Reinforcement learning is a robust artificial intelligence solution for agents required to act in an environment, making their own decisions on how to behave. Typically an agent is deployed alone with no prior knowledge, but if given sufficient time, a suitable state representation and an informative reward function, it is guaranteed to learn how to maximise its long-term reward. Incorporating doma...

Autonomous Learning of Reward Distribution for Each Agent in Multi-Agent Reinforcement Learning

A novel approach to reward distribution in multi-agent reinforcement learning is proposed. The agent who gets a reward gives a part of it to the other agents. If an agent gives a part of its own reward to the others, they may help the agent to get more reward. There may be some cases in which the agent gets more reward than it gave to the others. In this case, it is better for...

Journal

Journal title: Transactions of The Japanese Society for Artificial Intelligence

Year: 2021

ISSN: 1346-0714, 1346-8030

DOI: https://doi.org/10.1527/tjsai.36-5_ag21-h